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A SIXTH-ORDER  ACCURATE  DIRECT  SOLVER  FOR  THE 
POISSON  AND  HELMHOLTZ  EQUATIONS 


Patrick  J.  Roache  * 

Ecodynamics  Research  Associates,  Inc. 

Albuquerque,  New  Mexico 

INTRODUCTION 

The  Poisson  and  Helmholtz  equations  are  some  of  the  most  conmon  and 
important  equations  of  mathematical  physics.  Their  solution  is  required  in 
subsonic  aerodynamics  .problems,  in  steady-state  heat  conduction  problems, 
and  in  electric  field  calculations,  as  well  as  other  areas.  While  iterative 
methods  are  easy  to  code  and  are  flexible,  they  are  notoriously  slow  in 
converging  for  large  meshes.  Beginning  with  Hockney's  paper1  in  1965, 
direct  (non-iterative)  methods  are  now  frequently  used  because  of  their  great 
speed  advantage.  A complete  review  of  direct  methods  would  require  a large  review 
paper,  since  this  field  has  become  very  active  in  recent  years.  Almost  all 
the  methods  published  to  date  are  limited  to  second-order  accuracy,  i.e.  the 
truncation  error  Is  0(h2)  where  h Is  the  mesh  spacing.  An  exception  is 
Ref.  2,  which  presents  a method  that  is  0(h4)  but  is  limited  to  the 
constant-coefficient  equation  in  cartesian  coordinates. 

In  the  present  paper,  a direct  solver  is  described  for  a class  of 
separable  elliptic  equations  in  (r,e)  coordinates  which  is  0(h6)  accurate 
provided  that  the  problem  is  periodic  In  e.  As  in  Hockney's  method  , a 
discrete  Fourier  transform  is  used  in  e.  In  r,  the  Hermlte  6 discretization 
of  Rubin  and  Khosla3  is  used,  giving  0(ar«)  accuracy  at  interior  points. 

The  basic  equation  used  at  boundary  points  is  0(ar4)  but  is  corrected  in  a 
Index  category:  Computational  Methods. 

* Member,  AIAA. 
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single  deferred  correction  step  to  0(h®)  accuracy. 

THE  METHOO 

The  following  elliptic  equation  Is  solved  directly,  with  periodic 
boundary  conditions  in  e and  combinations  of  Dirichlet  (function)  and 
Neumann  (gradient)  conditions  In  r.  Subscripts  Indicate  partial  derivatives. 

a(r)*rr  + b(r)*r  + c(r)*  + * d(r,e)  (1) 

With  the  proper  choice  of  a-d,  this  equation  can  represent  the  Poisson 
or  Helmholtz  equation  in  cartesian,  polar,  elliptic,  biaxial,  or  parabolic 
cylinder  coordinates,  including  an  arbitrary  coordinate  transformation  in 
r.  The  only  requirement,  which  will  be  Imposed  by  the  use  of  the  Fast 
Fourier  Transform  (FFT)  in  e,  is  that  the  coefficients  be  separable  and  the 
e-term  Is  simply  * . (An  arbitrary  constant  can  be  absorbed  by  scaling, 
of  course.)  Other  coordinates  may  not  meet  these  requl rements , e.g.  spherical, 
prolate  spheroidal,  oblate  spheroidal,  toroidal,  bipolar,  and  paraboloidal^. 

An  important  system  which  does  meet  these  requirements  is  the  conformal 
aaoidinate  system  for  Joukowski  airfoils  used  by  Mehta®  to  approximate  real 
airfoi  Is. 

The  discrete  Fourier  transform  Is  applied  to  (1)  using  the  FFT  algorithm. 

This  transforms  the  variable  *(r,e)  to  i>(r,k)  where  k is  the  wave-number  in 
the  e-direction.  As  in  Ref.  1,  this  gives 

a(r)*rr  + b(r)i|>r  + (c(r)  - kZ/4>*  * d(r,k)  (2) 

where  d(r,k)  is  the  discrete  Fourier  transform  of  d(r,e).  Eqn.  (2)  is  now 
a decoupled  system  of  ordinary  differential  equations  in  r (with  k as  a parameter) 
which  is  solved  by  Hermite  6 discretization®.  A final  backward  FFT  then  gives  p. 

In  the  notation  of  Ref.  3,  Eqn.  (2)  is 
re-written  with  u a i,  m ; ur,  and  M = urr>  as 

AM  + Bm  + Cu  * D (3) 

where  A-D  are  obvious  shorter  notations  for  the  coefficients  in  Eqn.  (2). 

The  Hermite  6 discretization  provides  two  more  equations,  as  follows. 
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The  index  j is  the  index  for  the  discrete  node  points  in  r,  equally  spaced  h apart. 

24 


Vl  * 8M3  + Vl 


k*  (Vi 


2 u3  + UM> 


+ F <Vl  ' Vl5 


7 Vl  + 16  m.  ♦ Tm^  - f (Uj+1  - u^)  ♦ h(Mj+1  - Hj.j) 

Eqns.  (3),  (4)  and  (5)  are  three  equations  in  the  three  unknowns  .u.m.M. 
Since  Eqn.  (3)  applies  at  all  points,  we  can  write  it  at  index  values  j and 
j±l  and  use  it  to  eliminate  all  M's  from  Eqns.  (4)  and  (5),  giving 


(4) 

(5) 


+ ^ mj-1  + (8  ^ ' (* + ^ Vi 


8* u 


3+l 


(6) 


- (h.^  — - 7)  m.  . + 16m 

Vl  J 


J+l 


7)  m 


J+l 


,_JVl+hVl 

Aj-1  Aj+1 


(7) 


If  boundary  values  of  u and  m are  known,  Eqns.  (6)  and  (7)  constitute  a 
2x2  block-trl diagonal  system  of  linear  equations.  However, the  boundary  values 
of  both  u and  m cannot  be  given,  since  that  would  overspecify  the  continuum 
problem.  The  difficulty  in  most  high-order  accuracy  methods  is  to  find 
a consistently  high-order  boundary  formulation.  If  conventional  one-sided 
differences  are  used  to  evaluate  m at  boundaries,  the  band-width  of  the 
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If  G.  and  G,,  are  known,  the  system  of  equations  (6),  (7),  (10)  and  (11) 
1 JL 

constitute  a 2x2  block-trldlagonal  system  which  Is  an  0(h6)  approximation  to 
Eqn.  (2).  Since  G1  and  GJL  are  actually  unknown,  we  obtain  the  solution 
by  deferred  corrections  as  follows. 

Using  the  bi-tridiagonal  algorithm  of  von  Rosenberg6  or  another  banded 
matrix  solver,  we  obtain  a first  solution  to  the  system  with  Gj  * GJL  - 0. 
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From  Eqns.  (8)  and  (9),  It  Is  clear  that  this  gives  an  initial  solution  u° 
which  is  formally  Ojh1*)  accurate.  Based  on  u°,  we  evaluate  G1  and  GJL  using 

conventional  one-sided  7-point  differences^  for  uv  and  uv1  as  follows. 


h5(uv)1  * (-21  ux  + 120  u2  - 285  U3  + 360  u„  - 255  ug 
+ 96  Ug  -15  u2)/6 

hS(uV)JL*  (15  uJL-6  * 96  UJL-5  + 255  uJL-4  * 360  uJL-3 
+ 285  uJL-2  * 120  UJL- 1 + 21  UJL)/6 


(13a) 


(13b) 


h6(uvi)1  ■ Uj  - 6 u2  + 15  ug  -20  u4  + 15  ug  - 6 Ug  + u7 


(13c) 


uJL-6  * 6 UJL-5  + 15  uJL-4  ‘ 20  UJL-3  + 15  UJ 


(13d) 


These  values  are  used  in  Eqns.  (10)  and  (11)  snd  the  solution  Is  repeated. 

The  process  can  be  continued  until  convergence.  However,  since  uv  from  Eqns. 
(13a, b)  are  already  0(h2)  accurate  and  uv’  from  Eqns.  (13c, d)  are  already 
0(h)  accurate,  it  is  clear  from  Eqns.  (9-12)  that  the  first  corrected  solution 
to  u is  already  0(h6)  accurate,  so  the  method  with  a single  correction  can 


be  termed  "direct". 

Although  the  requirement  for  a bi-tridi agonal  solution  is  more  expensive 

of  computer  time  than  the  scalar  tridiagonal  solution  required  for  0(h2)  accuracy, 
and  although  the  present  method  requires  a correction  step,  the  operation  count  is 
the  same  order  as  Hockney's  method1.  That  is,  the  operation  count  for  the  FFT's 
with  N nodes  in  e is  0(NlnN)  if  N-l  = 2P,  and  the  two-dimensional  solution 
for  NxN  nodes  is  then  obtained  in  0(N2lnN)  operations.  It  should 

be  noted  that  the  deferred  correction  step  Is  done  within  a solution  for  a Fourier 
component,  so  that  the  FFT  operations  are  not  repeated  for  the  corrections. 


Also,  most  of  the  coefficients  of  the  2x2  system  need  not  be  re-evaluated 
for  the  correction  step(s). 

If  more  than  one  deferred  correction  step  Is  used,  round-off  error 
may  cause  the  Iterations  to  diverge  unless  the  corrections  are  under-relaxed. 
For  one  correction,  we  use 

gJ,  GjL  * function  ( u°)  (14a) 

but  for  more  correction  steps,  we  use 

g",  GjL  * function  ( SjU11'1  + >4  un~2)  (14b) 

NUMERICAL  VERIFICATION 

The  code  incorporating  this  method  has  been  tested  on  a variety  of 
two-dimensional  as  well  as  one-dimensional  problems,  and  is  designed  to 
handle  arbitrary  user-specified  coefficients  a-d  in  Eqn.  (1).  Representative 
results  are  presented  here  for  the  simple  equation 

*rr+  *ee  " d • °ir<2",  ° £ e < 2*  (15) 

for  two  cases, 

d.  * ->ssin(r)  sin(e)  (l6a) 

a 

which  gives  the  solution  *(r,e)  * sin(r)  sin(e),  and 

db  » -ijcosCr)  cos(e)  (16b) 

which  gives  the  solution  ip( r ,e)  ■ cos(r)  cos(e).  These  two  test  cases 
give  different  behavior  because  the  solution  to  (16a)  gives  uv1  * 0 at 
boundaries,  while  Eqn.  (16b)  gives  uv  « uvii  * 0 at  boundaries.  These  are 
useful  for  isolating  the  effect  of  the  deferred  corrections  for  Gj  and  GJL. 

The  numerical  tests  were  performed  on  a time-shared  COC  6600  NOS  System. 
The  computer  has  14+  significant  decimal  figures  of  word  length.  With  only 
mesh  sizes  of  the  form  N - 1 ■ 2P  considered,  the  largest  problem  which  fits 
into  the  34,752  word  central  memory  allocation  is  a 65x65  node  problem. 

The  truncation-error  convergence  for  the  test  cases  is  shown  in  Table  1. 
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For  both  test  problems  (16a)  and  (16b),  the  maximum  error  in  the  solution  E 
is  presented  for  cases  of  no  deferred  correction  (Lc  * 0),  one  deferred 
correction  (Lc  * 1)  and  Lc  * 10,  for  meshes  ranging  from  9x9  to  65x65  (p  * 

3 to  6).  The  maximum  error  E is  seen  to  decrease  very  rapidly,  as  expected.  A 
more  precise  evaluation  is  obtained  by  inspection  of  the  tabulated  values  of 
E/h4,  E/h5,  etc.  In  the  idealized  case  of  no  round-off  error  and  h-*0,  the 
value  of  E/hs  will  become  constant  for  an  s-order  accurate  method  as  h 
decreases. 

Part  (a)  of  Table  1 shows  the  performance  for  the  uncorrected 

method.  Although  formally  only  0(h4)  accurate,  it  is  seen  that  the  actual 
performance  is  closer  to  Q(h®)  since  the  value  of  E/h®  increases  only  from 
3.04x10"^  to  3.86x10"^  as  p varies  from  4 to  6,  for  the  more  difficult  test 
case  (16a).  Part  (b)  of  Table  1 shows  the  performance  for  the  method  with 
a single  deferred  correction.  Formally,  the  method  is  0(h®)  accurate,  and 
this  is  indicated  by  the  values  of  E/h®  which  for  the  worse  case  (16a) 
are  nearly  constant,  increasing  only  2U  as  p increases  from  4 to  6. 

However,  the  single  deferred  correction  is  not  as  accurate  as  a fully  iterated 
solution,  such  as  that  shown  in  Part  (c)  of  Figure  1 for  10  corrections. 

This  shows  E/h®  decreasing  significantly  for  both  test  cases,  indicating 
convergence  at  better  than  0(h6). 

Also  shown  in  Table  1 are  timing  tests,  obtained  on  a 

C0C  6600  NOS  System  using  FORTRAN  IV  with  Level  2 optimization.  The  CPU  time 
shown  is  the  average  of  at  least  three  runs,  and  does  not  include  I/O.  An  optimal 
method  will  have  an  operation  count  for  an  NxN  problem  proportional  to  N2, 

so  that  the  value  of  (CPU  time)/N2  would  be  a constant.  The  present  method 

has  a sub-optimal  operation  count  proportional  to  InN,  so  we  would  expect 

(CPU  time)/N2  to  Increase  with  N.  However,  there  is  evidently  enough  overhead 

operations  in  the  program  to  mask  this  behavior  over  the  range  of  N tested, 

since  (CPU  time)/N2  Increases  only  slightly  with  N in  Table  2.  (If  we  consider 
only  the  unknowns  at  interior  nodes,  (CPU  time/(N-2)2actually  decreases.)  The  time 
difference  between  the  0(h4)  solution  with  no  correction  and  the  0(h6) 


l 
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solutlon  with  one  correction  is  only  about  35%  for  N * 9 and  decreases  slightly 
for  increasing  N. 

DISCUSSION 

A direct  method  for  obtaining  0(1)6)  accurate  solutions  of  a class  of 
separable  elliptic  equations  with  periodic  boundary  conditions  in  one  direction 
has  been  presented  and  verified.  The  method  as  presently  coded  uses  the 

3 

constant-mesh-spacing  version  of  Hermite  6 discretization  of  Rubin  and  Khosla  . 
However,  it  should  be  noted  that  the  Hermite  6 equations  as  given  in  Ref. 3 
are  actually  more  general,  allowing  for  variable  h with  O(h^)  accuracy. 

Shear  layer  resolution  can  be  achieved  In  the  present  code,  since  the  form 
of  Eqn.  (1)  allows  for  arbitrary  coordinate  stretching  in  r;  however,  the 
variable-h  equations  would  be  useful  for  self-adaptive  mesh  solutions.  Another 
advantage  of  the  compact  formulas  such  as  Hermite  6 is  that  both  the  function 
and  its  first  derivative  (u  and  m in  Eqns.  3-11)  appear  explicitly.  It  is 
thus  very  easy  to  set  accurate  boundary  conditions  on  the  gradient,  which 
is  more  difficult  in  conventional  high-order  differencing.  Further,  the  numerical 
oscillations  associated  with  high  cell  Reynolds  numbers  are  mitigated  bv  the 

3 

use  of  compact  forms  . The  advantages  of  high  order  methods  are  well  known, 
provided  that  high  accuracy  Is  indeed  required  for  the  problem  at  hand.  Good 
candidates  for  sixth-order  methods  are  problems  which  Involve  fluid  stability 
calculations.  The  direct  solver  described  above  is  currently  being  used  in 
studies  of  time-dependent  vortex  shedding  from  oscillating  cylinders. 

On  a less  optimistic  consideration,  we  note:  (1)  it  will  be  difficult  to 
actually  realize  0(h6)  accuracy  In  real  aerodynamics  problems  with  ambiguities 
in  far-fleld  computational  boundary  conditions;  (2)  high  formal  accuracy  does 
not  solve  resolution  problems  associated  with  the  thin  shear  layers  of  high 
Reynolds  number  problems  (e.g.,  see  Ref.  8). 
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Part 

Eqn. 

£ 

Mesh 

Lc 

E 

E/h4 

E/h® 

E/h6 

CPU  time 
(sec) 

CPU  time/f|2 

NxN 

(msec) 

a 

1 

16a 

3 

9x9 

0 

.637-03 

. 167-02 

.213-02 

0.037 

0.457 

4 

17x17 

0 

.284-04 

. 119-02 

. 304-02 

0.135 

0.467 

5 

33x33 

0 

. 104-05 

.702-03 

.357-02 

0.519 

0.477 

6 

65x65 

0 

.352-07 

. 379-03 

. 386-02 

2.063 

0.488 

16b 

3 

9x9 

0 

.283-03 

.743-03 

.947-03 

.121-02 

same  as 

Eqn.  16a 

4 

17x17 

0 

.581-05 

.244-03 

.622-03 

. 159-02 

'• 

5 

33x33 

0 

. 104-06 

. 702-04 

. 356-03 

. 182-02 

M 

6 

65x65 

0 

. 174-08 

. 188-04 

.191-03 

. 195-02 

II 

> 

r 

t 

) 

16  a 

3 

9x9 

1 

. 188-03 

.799  -03 

0.050 

0.617 

4 

17x17 

1 

.858-05 

.234  - 02 

0.182 

0.630 

5 

33x33 

1 

. 147-06 

.257-02 

0.694 

0.637 

6 

65x65 

1 

.253-08 

.282-02 

2.737 

0.648 

16b 

3 

9x9 

1 

.953-03 

.406  -02 

same  as 

Eqn.  16a 

4 

17x17 

1 

.938-05 

.256-02 

• 

il 

5 

33x33 

1 

.518-07 

.904-03 

II 

6 

65x65 

1 

.280-09 

.312 -03 

II 

N 

< 

f 

16a 

3 

9x9 

10 

.205-03 

.873-03 

0.174 

2.148 

4 

17x17 

10 

.699-05 

.190-02 

0.609 

2.107 

5 

33x33 

10 

.950-07 

.166-02 

2.276 

2.090 

6 

65x65 

10 

.874-09 

.976-03 

8.805 

2.084 

16b 

3 

9x9 

10 

.897-03 

.382  -02 

same  as 

Eqn.  16a 

4 

17x17 

10 

.872-05 

.238-02 

11 

5 

33x33 

10 

.450-07 

.785-03 

II 

i 

f 

6 

65x65 

10 

.191-09 

.213-03 

II 

Table  1.  Performance  of  the  direct  0(h6)  elliptic  solver.  lc  * number  of  deferred  corrections. 
E * error  of  the  solution,  h = mesh  spacing  in  r and  e,  = 2ir/(N-l).  CPU  times  are  for  a 
FORTRAN  IV  program  run  on  a time-shared  CDC  6600  NOS  System  with  Level  2 optimization.  The 
notation  .637-03  means  0.637x10"^,  etc. 
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